Closing The Back Doors
The Effect, Chapter 8: Closing Back Doors
Moore & Siegel, Chapter 12. Ignore sections 12.3.6 and 12.3.7 on computing determinants and inverses by hand. We have computers that do that for us now.
This week, we discuss the linear model as a general-purpose tool for conditioning on confounding variables and recovering causal estimands. We will highlight the strengths and limitations of this approach, and show how to estimate the model’s parameters using matrix algebra.
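
To make the matrix-algebra step concrete, here is a minimal sketch in base R. It assumes simulated data (the variables `x`, `z`, and `y` are invented for illustration and are not from the course datasets) and estimates the coefficients by solving the normal equations, then compares the result with `lm()`.

```r
# Simulated data: hypothetical, not the course dataset
set.seed(42)
n <- 200
z <- rnorm(n)                          # a confounder
x <- 0.8 * z + rnorm(n)                # treatment, partly caused by z
y <- 1 + 2 * x - 1.5 * z + rnorm(n)    # outcome depends on both x and z

# Design matrix: a column of ones for the intercept, then x and z
X <- cbind(1, x, z)

# Solve the normal equations (X'X) beta = X'y for beta.
# crossprod(X) is t(X) %*% X; crossprod(X, y) is t(X) %*% y.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# lm() should give the same estimates (it uses a QR decomposition internally)
fit <- lm(y ~ x + z)
print(cbind(matrix_algebra = drop(beta_hat), lm = coef(fit)))
```

Passing both matrices to `solve()` avoids forming the inverse of X'X explicitly, which is usually the more numerically stable choice.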

Some three-dimensional plots illustrating the “plane of best fit”.
In class, we showed that GDP per capita was partly confounding
the observed relationship between democracy and corruption. Can you
think of any other back door paths that need to be closed? What variable
would you need to measure and condition upon to close that path? Try
your hand at finding a dataset with that variable, merging it with the
country-level corruption dataset we created in
R/week-09/cleanup-data.R, and adding it to the linear
model.
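
The merge-and-refit workflow might look something like the sketch below. Everything in it is a toy stand-in: the data frames, the column names (`country`, `corruption`, `democracy`, `gdp_per_cap`, `education`), and the values are hypothetical, and the real dataset produced by the cleanup script may use different names.

```r
# Toy stand-in for the country-level corruption dataset (hypothetical values)
corruption <- data.frame(
  country     = c("A", "B", "C", "D", "E", "F", "G", "H"),
  corruption  = c(2.1, 3.4, 1.2, 4.0, 2.8, 3.1, 1.6, 3.7),
  democracy   = c(60, 20, 90, 5, 40, 30, 75, 10),
  gdp_per_cap = c(30, 12, 45, 6, 20, 15, 33, 8)
)

# Toy stand-in for a new dataset containing the extra confounder
new_var <- data.frame(
  country   = c("A", "B", "C", "D", "E", "F", "G", "Z"),
  education = c(11, 8, 13, 5, 9, 10, 12, 7)
)

# Check which countries will be lost before merging
unmatched <- setdiff(corruption$country, new_var$country)
print(unmatched)

# merge() keeps only countries present in both data frames by default
merged <- merge(corruption, new_var, by = "country")

# Add the new variable to the linear model to condition on it
fit <- lm(corruption ~ democracy + gdp_per_cap + education, data = merged)
print(coef(fit))
```

With real data, country names often differ between sources, so it is worth inspecting the unmatched rows before trusting the merged result.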
Cohn et al. (2019) left wallets
in cities around the world to see how many of them would be returned.
It’s a very ambitious study. You can read about it here
and you can find the replication data in our repository at
data/cohn-2019/. You want the “behavioral data”; check out
the codebook to see what all the variables mean. The experimenters
varied a few things at random, including the type of institution where
each wallet was left (public, bank, etc.) and how much money it contained.
Your task is this: I want to know what fraction of wallets were returned
in each country when they were left at public institutions (this could
be another measure of public corruption, if bureaucrats tend to
steal wallets instead of returning them). Are government officials less
likely to steal wallets in countries that have been democracies for
longer periods of time (c_PIV_years_democracy)? What’s the
slope of that relationship? And can you interpret that relationship as
causal? Why or why not?
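
One possible shape for the computation is sketched below on simulated data. Only `c_PIV_years_democracy` comes from the task description; the column names `country`, `institution`, and `returned` are assumptions, so check the codebook for the real ones. Because the data are simulated, the slope printed here means nothing substantively.

```r
# Simulated wallet-level data; column names other than
# c_PIV_years_democracy are hypothetical
set.seed(1)
wallets <- data.frame(
  country     = rep(c("A", "B", "C", "D"), each = 50),
  institution = sample(c("public", "bank", "hotel"), 200, replace = TRUE),
  returned    = rbinom(200, 1, 0.5),   # 1 = wallet returned, 0 = not
  c_PIV_years_democracy = rep(c(10, 40, 70, 100), each = 50)
)

# Keep only wallets left at public institutions
public <- subset(wallets, institution == "public")

# The mean of a 0/1 indicator is the fraction returned in each country.
# Years of democracy is constant within a country, so grouping by it as
# well simply carries the column along.
by_country <- aggregate(
  returned ~ country + c_PIV_years_democracy,
  data = public,
  FUN = mean
)

# Slope of return rate on years of democracy, one row per country
fit <- lm(returned ~ c_PIV_years_democracy, data = by_country)
print(by_country)
print(coef(fit))
```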
Optional Challenge: Replicate Figure 1 in the
Cohn et al. (2019) paper using
ggplot.